# Video Text Understanding
Vica2 Init
Apache-2.0
ViCA2 is a multimodal vision-language model focused on video understanding and visual-spatial cognition tasks.
Video-to-Text
Transformers, English

nkkbr
30
0
Vica2 Stage2 Onevision Ft
Apache-2.0
ViCA2 is a 7B-parameter multimodal vision-language model focused on video understanding and visual-spatial cognition tasks.
Video-to-Text
Transformers, English

nkkbr
63
0
Videochat R1 Thinking 7B
Apache-2.0
VideoChat-R1-thinking_7B is a multimodal model based on Qwen2.5-VL-7B-Instruct, focusing on video-text-to-text tasks.
Video-to-Text
Transformers, English

OpenGVLab
800
0
Videochat TPO
MIT
A multimodal large language model based on the paper 'Task Preference Optimization: Improving Multimodal Large Language Models through Visual Task Alignment'.
Text-to-Video
Transformers

OpenGVLab
18
5